Assessing the relevance of node features for network structure

Networks describe a variety of interacting complex systems in social science, biology and
information technology. Usually the nodes of real networks are identified not only by their
connections but also by some other characteristics. Examples of node characteristics are the
age, gender or nationality of a person in a social network, the abundance of proteins in the
cell taking part in a protein-interaction network, or the geographical position of airports
that are connected by direct flights. Integrating the information on the connections of each
node with the information about its characteristics is crucial to discriminating between the
essential and negligible characteristics of nodes for the structure of the network. In this
paper we propose a general indicator Θ, based on entropy measures, to quantify the dependence
of a network's structure on a given set of features. We apply this method to social networks
of friendships in US schools, to the protein-interaction network of Saccharomyces cerevisiae
and to the US airport network, showing that the proposed measure provides information which
complements other known measures.




Networks have become a general tool for describing the structure of interactions or
dependencies in such disparate systems as cell metabolism, the internet and society. Loosely
speaking, the topology of a given network can be thought of as the byproduct of chance and
necessity, where functional aspects and structural features are selected in a stochastic
evolutionary process. The issue of separating "chance" from "necessity" in networks has
attracted much interest. This entails understanding random network ensembles (i.e. chance)
and their inherent structural features, but also developing techniques to infer structural
and functional characteristics on the basis of a given network's topology. Examples range
from the inference of gene function from protein-interaction networks [10] to the detection
of communities in social networks [11, 12]. Community detection [34], for example, aims at
uncovering a hidden classification of nodes, and a variety of methods have been proposed
relying i) on structural properties of the network (betweenness centrality [13], modularity
[14], spectral decomposition, cliques [16] and hierarchical structure [13]), ii) on
statistical methods [18] or iii) on processes defined on the network [19]. Implicitly, each
of these methods relies on a slightly different understanding of what a community is.
Furthermore, there are intrinsic limits to detection; often the outcome depends on the
algorithm, and a clear assessment of the role of chance is possible in only a few cases
(see e.g. [20]).

As a matter of fact, in several cases a great deal of additional information, beyond the
network topology, is known about the nodes. This comes in the form of attributes such as
age, gender and ethnic background in social networks, or annotations of known functions for
genes and proteins. Sometimes this information is incomplete, so it is legitimate to attempt
to estimate missing information from the network's structure. But often the empirical data
on the network are no more reliable or complete than those on the attributes of the nodes.
In such cases, it may be more informative to ask what the functions or attributes of the
nodes tell us about the network than the other way around. In this paper we propose an
indicator Θ that quantifies how much the topology of a network depends on a particular
assignment of node characteristics. This provides an information bound which can be used as
a benchmark for feature extraction algorithms. This exercise, as we shall see, can also
reveal statistical regularities which shed light on possible mechanisms underlying the
network's stability and formation.

In the following, we first define Θ, then we investigate separately the case in which the
assignment of node characteristics induces a community structure on the network, and the
case in which the assignment corresponds to a position of the nodes in some metric space.
We will calculate Θ for benchmarks and for examples of social, biological and economic
networks.



DEFINITION OF Θ

We shall first describe our indicator Θ in a simple case study and then give a general
abstract definition.

Let us consider the specific problem of evaluating the significance of the network
community structure $\mathbf{q} = (q_1, \dots, q_N)$ induced by the assignment of a
characteristic $q_i \in \{1, \dots, Q\}$ to each node $i \in \{1, \dots, N\}$ of a network
of N nodes. Individual nodes are characterized by their degree $k_i$, which is the number of
links they have to other nodes in the network. The network g is fully specified by the
adjacency matrix, taking values $g_{ij} = 1$ if nodes i and j are linked and 0 otherwise.
The community structure induced by the assignment $q_i$ on the network is described by a
matrix A of elements $A(q, q')$ indicating the total number of links between nodes with
characteristics q and q'. A natural measure of the significance of the induced community
structure $\mathbf{q}$ on the network g is provided by the number of graphs g' between those
individual nodes (characterized by the degree sequence $\mathbf{k}$) which are consistent
with A. The logarithm of this number is the entropy $\Sigma_{\mathbf{k},\mathbf{q}}$
[21, 22] of the distribution which assigns equal weight to each graph g' with the same A
and $\mathbf{k}$. This number also depends on the degree sequence $\mathbf{k}$ and the
relative frequency of different values of q across the population. These systematic effects
are removed by considering the entropy $\Sigma_{\mathbf{k},\pi(\mathbf{q})}$ obtained from a
random permutation $\pi(\mathbf{q}): i \to q_{\pi(i)}$ of the assignments, where
$\{\pi(i), i = 1, \dots, N\}$ is a random permutation of the integers
$i \in \{1, \dots, N\}$. The indicator Θ is obtained as the standardized deviation of
$\Sigma_{\mathbf{k},\mathbf{q}}$ from the entropy of networks with randomized assignments:

$$\Theta_{\mathbf{k},\mathbf{q}} = \frac{E_\pi\left[\Sigma_{\mathbf{k},\pi(\mathbf{q})}\right] - \Sigma_{\mathbf{k},\mathbf{q}}}{\sqrt{E_\pi\left[\left(\Sigma_{\mathbf{k},\pi(\mathbf{q})} - E_\pi\left[\Sigma_{\mathbf{k},\pi(\mathbf{q})}\right]\right)^2\right]}} \qquad (1)$$

where $E_\pi[\dots]$ stands for the expected value over random uniform permutations
$\pi(\mathbf{q})$ of the assignments. In words, Θ measures the specificity of the network g
for the particular assignment $\mathbf{q}$ with respect to assignments obtained by a random
permutation.
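
As an illustration of the two ingredients used above, the degree sequence and the matrix A(q, q'), the following minimal sketch computes both from an adjacency matrix. It is not taken from the paper's Supplementary Material: the function name `degrees_and_block_counts` and the 0-based class labels are assumptions made for this example.

```python
import numpy as np

def degrees_and_block_counts(g, q, n_classes):
    """Return the degree sequence k and the link-count matrix A(q, q').

    g: symmetric 0/1 adjacency matrix with zero diagonal (N x N).
    q: integer class labels, one per node (ASSUMED 0-based: 0..n_classes-1).
    """
    g = np.asarray(g)
    q = np.asarray(q)
    k = g.sum(axis=1)                          # k_i = sum_j g_ij
    onehot = np.eye(n_classes, dtype=int)[q]   # N x Q membership indicator
    a = onehot.T @ g @ onehot                  # counts ordered pairs (i, j)
    a[np.diag_indices(n_classes)] //= 2        # within-class links counted twice
    return k, a
```

For a path 0-1-2-3 with labels (0, 0, 1, 1), this gives one link inside each class and one link between the two classes.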

The indicator Θ can be similarly defined in a much more general setting, with the following
abstract definition. Let $g \in \mathcal{G}_N$ be the network we are interested in, where N
is the number of vertices, $g_{ij}$ is the adjacency matrix and $\mathcal{G}_N$ is the set
of all graphs of N vertices. An assignment is a vector $\mathbf{q}$ such that, for each node
i, $q_i \in \mathcal{Q}$ is defined on a set $\mathcal{Q}$ of possible characteristics,
given by the context. Call $\mathcal{Q}^N$ the set of all possible such vectors on
$\mathcal{Q}$. A feature is a mapping $\phi: \mathcal{G}_N \times \mathcal{Q}^N \to \Phi$,
which associates to each graph g and assignment $\mathbf{q}$ a graph feature
$\phi(g, \mathbf{q}) \in \Phi$. As will become clear, we do not need any assumption about
the topology of the set of features $\Phi$.

A simple example of features is those which do not depend on any assignment,
$\phi(g, \mathbf{q}) = \phi(g)$, such as the number of edges or the degree sequence.
Instead, the previously introduced community structure A is an example of a feature
depending both on the degree sequence $\mathbf{k}$ and on the assignment $\mathbf{q}$, i.e.
$\phi(g, \mathbf{q}) = \{\mathbf{k}, A(q, q'),\ q, q' \in \mathcal{Q}\}$.
In order to assess the relevance of a feature $\phi(g, \mathbf{q})$, we make use of the
entropy $\Sigma_{\phi(g,\mathbf{q})}$ of randomized network ensembles [21, 22]. The entropy
of the ensemble of graphs with feature $\phi(g, \mathbf{q})$ is defined as the logarithm of
the number of possible graphs consistent with $\phi(g, \mathbf{q})$, normalized by N [36]:

$$\Sigma_{\phi(g,\mathbf{q})} = \frac{1}{N} \log \left| \left\{ g' \in \mathcal{G}_N : \phi(g', \mathbf{q}) = \phi(g, \mathbf{q}) \right\} \right| . \qquad (2)$$

This quantity evaluates the level of randomness that is present in the ensemble of networks
with a given feature. The numerical evaluation of the entropy $\Sigma_{\phi(g,\mathbf{q})}$
is a very challenging problem. However, this quantity can be calculated analytically by
introducing a partition function in a statistical mechanics formalism and evaluating it by
saddle point approximation (see the Supplementary Materials for the equations and the codes
for the evaluation of Σ). Finally, with the same notation used above, the indicator Θ is
defined as

$$\Theta_{\phi(g,\mathbf{q})} = \frac{E_\pi\left[\Sigma_{\phi(g,\pi(\mathbf{q}))}\right] - \Sigma_{\phi(g,\mathbf{q})}}{\sqrt{E_\pi\left[\left(\Sigma_{\phi(g,\pi(\mathbf{q}))} - E_\pi\left[\Sigma_{\phi(g,\pi(\mathbf{q}))}\right]\right)^2\right]}} . \qquad (3)$$

The quantity Θ provides a measure of the relevance of a given feature $\phi(g, \mathbf{q})$
for the structure of the network. While $\Sigma_{\phi(g,\mathbf{q})}$ can be obtained in
analytic form, the average and the standard deviation over permutations require a random
sampling of the space of possible permutations of the characteristics. In practice,
$N_{samp}$ random permutations are drawn in order to estimate the expected value and the
variance of $\Sigma_{\phi(g,\pi(\mathbf{q}))}$. Furthermore, the maximal deviation of
$\Sigma_{\phi(g,\pi(\mathbf{q}))}$ from the expected value provides an estimate of the
confidence interval at probability $p = 1/N_{samp}$ [37].
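
The permutation sampling described above can be sketched as follows. This is only an illustration under stated assumptions: the entropy `block_entropy` is a simplified stand-in that counts graphs with the same link counts A(q, q') but does NOT constrain the degree sequence, so it is not the saddle-point entropy of the paper. The names `log_binom`, `block_entropy` and `theta`, the 0-based labels and the default `n_samp` are all hypothetical.

```python
import numpy as np
from math import lgamma

def log_binom(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def block_entropy(g, q, n_classes):
    """Stand-in entropy: (1/N) log of the number of simple graphs with the
    same link counts between classes. ASSUMPTION: degrees are not constrained."""
    g, q = np.asarray(g), np.asarray(q)
    sizes = np.bincount(q, minlength=n_classes)
    onehot = np.eye(n_classes, dtype=int)[q]
    a = onehot.T @ g @ onehot                  # ordered-pair link counts
    total = 0.0
    for r in range(n_classes):
        for s in range(r, n_classes):
            if r == s:
                pairs = sizes[r] * (sizes[r] - 1) // 2
                links = a[r, r] // 2           # each link was counted twice
            else:
                pairs = sizes[r] * sizes[s]
                links = a[r, s]
            total += log_binom(int(pairs), int(links))
    return total / len(q)

def theta(g, q, n_classes, entropy_fn=block_entropy, n_samp=200, seed=0):
    """Standardized deviation of the entropy from its value under random
    permutations of the assignment (the form of the indicator defined above)."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q)
    sigma = entropy_fn(g, q, n_classes)
    samples = np.array([entropy_fn(g, rng.permutation(q), n_classes)
                        for _ in range(n_samp)])
    std = samples.std()
    if std == 0:
        return float("nan")                    # permutations never change the entropy
    return (samples.mean() - sigma) / std
```

Passing a different `entropy_fn` (for instance one that also enforces the degree sequence) leaves the sampling loop unchanged, which is the main reason for keeping the two functions separate in this sketch.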

Besides the value of Θ, our approach also provides more detailed information. Technically,
this is extracted from the saddle point values of the Lagrange multipliers introduced in
the calculation of $\Sigma_{\phi(g,\mathbf{q})}$ in order to enforce the constraints (see
Supplementary Material). In the examples discussed below, this information is encoded in
the probability $p_{ij}$ that a node i is linked to a node j in an ensemble with a given
feature $\phi(g, \mathbf{q})$. This is given by

$$p_{ij} = \frac{z_i z_j W(q_i, q_j)}{1 + z_i z_j W(q_i, q_j)} . \qquad (4)$$

The values of the "hidden variables" $z_i$ and of the statistical weight $W(q, q')$ can be
inferred from the real data [21]. Therefore the function $W(q_i, q_j)$ can shed light on
the dependence of the probability of a link between nodes i and j on their assignments
$q_i$ and $q_j$.
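
A minimal sketch of how the link probabilities of Eq. (4) can be evaluated once the hidden variables and the statistical weight are known. The function name `link_probabilities`, the 0-based class labels and the input values in the usage note are assumptions for illustration; how z and W are actually fitted is described in the Supplementary Material and is not reproduced here.

```python
import numpy as np

def link_probabilities(z, w, q):
    """p_ij = z_i z_j W(q_i, q_j) / (1 + z_i z_j W(q_i, q_j)).

    z: hidden variables, one per node.
    w: Q x Q symmetric matrix of statistical weights W(q, q').
    q: integer class labels per node (ASSUMED 0-based).
    """
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    q = np.asarray(q)
    x = np.outer(z, z) * w[np.ix_(q, q)]   # z_i z_j W(q_i, q_j) for all pairs
    p = x / (1.0 + x)
    np.fill_diagonal(p, 0.0)               # no self-links
    return p
```

For example, `link_probabilities([1.0, 1.0, 2.0], [[1.0, 0.5], [0.5, 1.0]], [0, 0, 1])` returns a symmetric matrix; if links are treated as independent, its row sums are the expected degrees.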



APPLICATION TO NETWORKS WITH A
COMMUNITY STRUCTURE

In the following, we will describe how to measure Θ for assessing the relevance of a
community structure. First, we analyze the behavior of Θ on synthetic data-sets. These have
been used as benchmarks for community detection algorithms [13]. For these benchmarks we
find that Θ increases with the number N of vertices, reflecting the intuitive idea that
larger graphs can resolve finer information on the global architecture of the network. We
shall see that even in the region where community detection algorithms fail, there is a
detectable influence of community structure on the topology of the network. Next, we apply
this tool to a social network and a biological network. In particular, we will consider a
dataset of friendship networks in US schools and a network of high confidence
protein-protein interactions [23]. The dataset of friendship networks in US schools, which
includes 84 schools, is particularly suitable for contrasting the information gained from Θ
with that derived from other indicators, such as modularity [14]. We will show that, at
least in this case study, the information provided by Θ is of a different nature and more
detailed than that provided by other measures.

As discussed above, in this section we shall take $q_i \in \{1, \dots, Q\}$ to be the label
of the class which node i belongs to, with $Q < \sqrt{N}$ [38]. The feature
$\phi(g, \mathbf{q}) = \{\mathbf{k}, A(q, q')\}$ specifies the degree sequence $\mathbf{k}$
and the number $A(q, q')$ of links between nodes in communities q and q'. Finally, we
calculate the indicator Θ defined in Eq. (3) for the different cases.


Evaluation of Θ on benchmarks

We evaluate Θ on the benchmark random networks, originally proposed in Ref. [13], of
N = 128, 256, 512 nodes divided into four communities of equal size with fixed average
connectivity k = 16, varying the average degree $k_{out}$ towards different communities.

The results are shown in Fig. 1. This shows that, for a fixed structure, Θ for different
values of N nicely collapses on a single curve when rescaled by the factor $\sqrt{N}$. This
suggests that the size dependence of Θ results from the random fluctuations of the
intensive quantity $\Sigma_{\phi(g,\mathbf{q})}$. Hence the same scaling is expected in
general, in not too heterogeneous systems [39].

Secondly, Fig. 1 shows that Θ vanishes only when there is no distinction in linking
probabilities: with 4 groups this occurs when $k_{out} = \frac{3}{4} k = 12$, which is
larger than the value ($k_{out} \simeq 8$) where community detection algorithms fail.
Indeed, community detection can be seen as the inverse problem to that addressed here. In
this spirit, Θ provides an a priori bound on the possibility of detecting communities in
networks, as well as a universal indicator of the performance of different algorithms.

FIG. 1: The dependence of Θ on $k_{out}$ in random networks of N = 128, 256, 512 nodes with
four equal size communities and average connectivity k = 16 (each point is obtained as an
average over 10 realizations). To be compared with Fig. 7 in [13] and Fig. 10 in [1].


The dataset of friendship networks in US schools

We apply our method to a dataset of 84 US schools in which students were asked to provide
information about themselves (specifying in particular sex, age and ethnic background)
together with the names of up to 5 of their female friends and up to 5 of their male
friends. Although the networks are directed in origin, in our analysis, in order to
simplify the study, we consider them as undirected: an undirected link is present if at
least one of the two students has indicated the other as a friend. The maximal connectivity
of these networks is k = 16, reached only in rare cases. This dataset is particularly
interesting for the study of homophily in American schools.

We have measured the value of Θ associated with the community structure induced by the
self-reported ethnic background (there were six possibilities in the questionnaire: Q = 6)
in all the 84 schools of the dataset. Loosely speaking, in this case Θ measures the extent
to which ethnic background shapes the social network of friendship in US schools.

For 25% of the schools we find that Θ is not significant at the 5% confidence level. For
the rest of the data set, Θ takes widely scattered values across schools (up to
Θ ≈ 532). In order to assess how much the variation in Θ correlates with ethnic diversity
we take, as a measure of diversity in the assignment $\mathbf{q}$, the Shannon entropy

$$S = -\sum_{q=1}^{Q} x_q \log(x_q) \qquad (5)$$

where $x_q$ is the fraction of nodes with $q_i = q$. We remark that the Shannon entropy of
a partition measures the diversity in the population but does not contain any information
on the social network.
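
The Shannon entropy of an assignment can be computed in a few lines. This is a minimal sketch: the function name `shannon_entropy` is hypothetical, and the natural logarithm is an assumption, since the text does not state the base used.

```python
import numpy as np

def shannon_entropy(q):
    """S = -sum_q x_q log(x_q), with x_q the fraction of nodes in class q.
    ASSUMPTION: natural logarithm."""
    _, counts = np.unique(np.asarray(q), return_counts=True)
    x = counts / counts.sum()
    return float(-(x * np.log(x)).sum())
```

A population with a single class gives S = 0, and two classes of equal size give S = log 2. Only the class frequencies enter, so S is unchanged by any permutation of the labels among the nodes.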



FIG. 2: Relation between $\Theta/\sqrt{N}$, which captures the relevance of ethnic
background for friendship networks, and the Shannon entropy S, indicating the ethnic
diversity of the student population. Each point corresponds to a different US school in the
data-set. Error bars indicate 5% confidence intervals.

FIG. 3: The value of $\Theta/\sqrt{N}$ versus the modularity M for the dataset of
friendship networks in American schools. Each point is a school.


In Fig. 2 we report the dependence of Θ on S. We observe that the value of
$\Theta/\sqrt{N}$ is small and not statistically significant in ethnically uniform schools
(S < 0.3) but it grows larger and significant for schools with a stronger diversity. The
largest values of Θ, as well as the largest spread, occur for intermediate values of S
(0.4 to 0.5), suggesting a nontrivial dependence. In order to assess the statistical
relevance of this result, we have studied the dependence of Θ on S in benchmark synthetic
networks, such as the ones presented above, where the fraction of links within the
community of each individual is kept constant, but the relative sizes of communities are
varied. A much weaker, barely significant increase of Θ with S was found in synthetic
networks, hinting that a non-trivial interplay between homophily and diversity might be
responsible for the features observed in Fig. 2.

A popular measure for community structure, frequently used in the literature, is
modularity, which is closely related to inbreeding homophily indices in social sciences
[27] and the F-statistic in genetics.

Modularity M measures how densely connected the nodes that belong to the same partition
are. It is defined as

$$M = \sum_{q=1}^{Q} \left[ \frac{l_q}{L} - \left( \frac{k_q}{2L} \right)^2 \right] \qquad (6)$$

where Q is the total number of communities or classes, L is the total number of links,
$l_q$ is the total number of links joining nodes of community q and $k_q$ is the sum of
the degrees of the nodes in the community q.
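
The modularity of a given assignment follows directly from the quantities just defined (L, the within-community links l_q and the summed degrees k_q). The sketch below uses the standard form M = Σ_q [l_q/L − (k_q/2L)²]; the function name `modularity` is an assumption for this example.

```python
import numpy as np

def modularity(g, q):
    """Modularity of the assignment q on the undirected graph g.

    g: symmetric 0/1 adjacency matrix with zero diagonal.
    q: class label of each node.
    """
    g = np.asarray(g)
    q = np.asarray(q)
    n_links = g.sum() / 2                       # L, total number of links
    k = g.sum(axis=1)                           # node degrees
    m = 0.0
    for c in np.unique(q):
        members = (q == c)
        l_c = g[members][:, members].sum() / 2  # links inside community c
        k_c = k[members].sum()                  # summed degree of community c
        m += l_c / n_links - (k_c / (2 * n_links)) ** 2
    return float(m)
```

Two disconnected triangles labeled as two communities give M = 0.5, while putting all nodes in a single class gives M = 0.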

Fig. 3 reports the value of $\Theta/\sqrt{N}$ versus the value of the modularity M for
each school, suggesting that the two indicators are not simply correlated. The two
indicators provide different information: loosely speaking, while M provides an absolute
measure of the excess of inward or outward links in a community assignment, Θ measures how
much the bias in the community assignment is correlated with the network topology.

In order to substantiate this statement in a visual manner, we identify two schools with
different values of $\Theta/\sqrt{N}$, but similar values of modularity M and Shannon
entropy S. Fig. 4 reports the friendship networks in the two schools, strongly suggesting
that significant differences in Θ imply different degrees of separation between the
different communities, an effect which is not captured by M. This shows that a community
assignment with a given value of the modularity is more informative on the network
topology when the network is strongly clustered in groups than when the network has a less
pronounced cluster structure.



FIG. 4: The case of two schools with similar modularity and Shannon entropy but very
different values of Θ. The top figure represents the friendship network of a school of
$N_1 = 1461$ students, average connectivity $\langle k \rangle = 5.3$, Shannon entropy
$S_1 = 0.41$, modularity $M_1 = 0.64$ and $\Theta_1/\sqrt{N} = 1.69$. The bottom figure
represents the friendship network of a school of $N_2 = 1147$ students, average degree
$\langle k \rangle = 8.8$, Shannon entropy $S_2 = 0.48$, modularity $M_2 = 0.66$ and
$\Theta_2/\sqrt{N} = 15.71$. The different colors represent the self-reported ethnic
backgrounds of the students.


The dataset of a protein-protein interaction network

We apply the proposed method to the study of the relevance of protein abundance for the
protein interaction map of Saccharomyces cerevisiae. The dataset, published in [23], is a
subset of the protein-interaction network of Saccharomyces cerevisiae formed by N = 1,740
proteins with known concentrations $x_i$ and 4,185 interactions, independently confirmed
in at least two publications. The abundance of a protein varies from 50 molecules per cell
up to 1,000,000 molecules per cell, with a median of 3,000 molecules per cell. The
abundance of a protein is not correlated with simple local structural features of the
protein interaction map, such as the degree (R = 0.13) or the clustering coefficient
(R = 0.005). This raises the question of whether the concentration of proteins has any
relevance to the interaction network and, if so, what information it provides.

FIG. 5: Relevance of protein abundance x for the protein-protein interaction network
studied in [23]. The statistical weight $W(x, x')$ describing the likelihood of links
between proteins with concentrations x and x' is first normalized to the analogous function
$W_R(x, x')$ which is obtained in the randomized data-set (with a random permutation of the
abundance values $x_i$). The density plot reports the dependence of $W(x,x')/W_R(x,x')$ as
a function of the protein abundance.

We bin the abundance x into 20 logarithmically spaced intervals given by the ordered vector
$\mathbf{x} = (x_0, x_1, \dots, x_{20})$. Next, we assign to each protein i the
corresponding coarse-grained abundance $q_i = k$ if $x_i \in [x_{k-1}, x_k)$. The features
of the network that we consider are again the connectivity of each protein together with
the number of links between proteins of different abundance $A(q_i, q_j)$. We find a value
of Θ = 21.76, well beyond the 1% confidence interval Θ < 2.7, showing that the abundance of
the protein encodes relevant information on the network structure. In Fig. 5 we report the
value of the statistical weight $W(x,x')$ in Eq. (4) as a function of the (log-) abundance
of each pair of proteins in the network. The value of $W(x,x')$ is normalized to the value
$W_R(x, x')$ found in networks where the protein abundance is randomized, in order to
highlight features of the specific concentration assignments in the data-set. The maximum
of $W(x,x')/W_R(x,x')$ along the diagonal suggests that proteins of a given concentration
tend to interact preferentially with proteins with a similar concentration, therefore
showing some "assortativity" of the interaction map in the plane of the abundance x, x'.



APPLICATION TO SPATIAL NETWORKS 

The role of the space in which networks are embedded, and its implications for
navigability and efficiency, has attracted considerable interest [29, 30, 31, 32]. Here we
show how the proposed indicator Θ can be used for assessing how relevant the spatial
position of the nodes in some geographical or abstract metric space is.

In this case, each node can be characterized by its degree $k_i$ and by its position in
space $q_i$. We first define a set $\mathbf{d} = \{d_1, \dots, d_D\}$ (with D = O(N)) of
fixed increasing distance values. We then consider the ensemble of networks with given
feature $\phi(g, \mathbf{q}) = \{\mathbf{k}, B(\mathbf{d})\}$, where
$B(\mathbf{d}) = (b_1, \dots, b_D)$ is the vector of the total number $b_\ell$ of links
between nodes at distance $d = |q_i - q_j| \in [d_{\ell-1}, d_\ell]$ (with $d_0 = 0$).
Finally we calculate the entropy of this ensemble and the indicator Θ from the definition
of Eq. (3).
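
The vector B(d) of link counts per distance bin can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the paper: the function name `links_per_distance_bin`, the use of Euclidean distance between coordinate vectors (geographic data would need great-circle distances), and logarithmic bins that start at the smallest observed distance rather than at d_0 = 0.

```python
import numpy as np

def links_per_distance_bin(g, pos, n_bins=20):
    """Count links in logarithmically spaced distance bins.

    g: symmetric 0/1 adjacency matrix.
    pos: N x dim array of node coordinates (ASSUMED distinct, Euclidean metric).
    Returns the bin edges and the vector b of link counts per bin.
    """
    g = np.asarray(g)
    pos = np.asarray(pos, dtype=float)
    i, j = np.triu_indices(len(pos), k=1)            # each node pair once
    dist = np.linalg.norm(pos[i] - pos[j], axis=1)
    edges = np.geomspace(dist.min(), dist.max(), n_bins + 1)
    # Bin index of every pair; the largest distance falls in the last bin.
    bins = np.clip(np.searchsorted(edges, dist, side="right") - 1, 0, n_bins - 1)
    b = np.bincount(bins, weights=g[i, j], minlength=n_bins)
    return edges, b.astype(int)
```

The entries of `b` sum to the total number of links, so the same binning can be reused for the randomized positions when comparing the two cases.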



The dataset of US airport networks 

Here we apply the proposed method to the network of US airports studied in [33]. We find
that, as occurs for the internet [29], the airport network is also consistent with a
power-law dependence of the linking probability between two nodes on their distance. The
network contains N = 675 airports and 3,253 connections, each of which represents regular
flights between two airports. In this case, each airport is associated with a geographical
location $q_i$. We bin the distances into D = 20 logarithmically spaced intervals and we
consider as features of our graph the degree sequence $\mathbf{k}$ together with
$B(\mathbf{d})$, as discussed above. We find a high value of Θ = 1.1 x 10^, showing high
significance of space in the structure of airport connections, as expected. In this case,
$W(q, q') = W(d(q, q'))$ is a function of the distance only. In Fig. 6 we report the shape
of the function W = W(d), depending on the distance d between any two airports i and j,
together with the shape of $W_R(d)$ in the case in which the positions of the airports are
randomly reshuffled. The function W(d) indicates that the probability of a connection
decays approximately like a power-law, with deviations for airports at distances smaller
than 100 km (flights over such small distances mainly connect places such as islands or
remote areas in Alaska, for which charter flights are the only feasible connection). A
log-log fit yields $W(d) \sim d^{-\alpha}$ with α = 3.0 ± 0.2 for d > 100 km.
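
A log-log fit of this kind can be sketched with an ordinary least-squares line through (log d, log W). This is an illustrative assumption about the procedure, not the authors' fitting code: the function name `fit_power_law` and the default cutoff `d_min` are hypothetical, and no uncertainty estimate is computed.

```python
import numpy as np

def fit_power_law(d, w, d_min=100.0):
    """Fit W(d) ~ c * d**(-alpha) for d > d_min by least squares in log-log space.

    Returns (alpha, c). Points with non-positive weight are ignored.
    """
    d = np.asarray(d, dtype=float)
    w = np.asarray(w, dtype=float)
    keep = (d > d_min) & (w > 0)
    slope, intercept = np.polyfit(np.log(d[keep]), np.log(w[keep]), 1)
    return -slope, np.exp(intercept)
```

On exact synthetic data such as `w = 5.0 * d**-3.0` the fit recovers an exponent of 3 and a prefactor of 5 up to rounding; on binned empirical data the result depends on the binning and on the cutoff.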

Networks with a linking probability which depends on a power-law of the distance are of
special relevance, both because they occur in real networks (see e.g. [29]) and, in
abstract terms, for navigability and efficiency [30, 31, 32].

A possible interpretation of the reported statistical regularity is the following. Imagine
that flights were designed by a social planner with the aim of optimizing the network for
a uniformly distributed population of passengers. This task is similar to that of finding
small-world networks with optimal navigability. Following the pioneering work of Kleinberg
[30], it has been shown that optimal navigability can be achieved in small-world networks
where long-range links are drawn from a distribution with $\alpha \in [2, 3]$ [32]. If we
suppose that airports are uniformly distributed across the country and that flying costs
have a contribution which increases linearly with distance, then an airline company would
face costs

$$C(R) \propto 2\pi \int_R^{\infty} dr \, r^2 \, W(r) \propto R^{3-\alpha}$$

to cover distances greater than $R \gg 1$. With α < 3, costs would be dominated by long
distance flights. In a regime of free competition between airlines, α > 3 is essential in
order to maintain a diversified portfolio of flights over short and long distances. This
suggests that α ≈ 3 corresponds to the optimal compromise between networks with optimal
navigability and those which are economically viable in a competitive market of airline
companies.

FIG. 6: The function $W(q_i, q_j) = W(d)$ in the US airport network, which (see Eq. (4))
encodes the statistical weight of a link between airports at distance d (in km). For
comparison, the same function is shown for the randomized network in which the geographic
locations of the airports have been reshuffled. The line, which represents an inverse
dependence on the cube of the distance (α = 3), is drawn as a guide to the eye.



CONCLUSION

In conclusion, we propose a method for assessing the relevance of additional information
about the nodes of a network using the information that comes from the topology of the
network itself. The method makes use of a new quantity Θ, which is not reducible to any
other quantity already introduced in network analysis. The method can be generalized to
directed or weighted networks. We test and illustrate this method on synthetic as well as
real networks, such as the social network of friendship interaction in US schools, the
protein interaction map of Saccharomyces cerevisiae and the US airport network. As a
byproduct, the method also provides additional non-trivial information and highlights
hidden statistical regularities.



DATA



The networks of American schools come from the National Longitudinal Study of Adolescent
Health. The data come from surveys conducted in 1994 in a sample of 84 American high
schools and middle schools by the UNC Carolina Population Center
(http://www.cpc.unc.edu/addhealth).

The protein interaction map that we used is based on the BioGRID database 2.0.20
(http://www.thebiogrid.org). It is described in detail in [23] and is freely available as
Supplementary Material.

The airport network was recorded by [33] from the 2005 statistics of the International Air
Transport Association (IATA, http://www.iata.org) and is available at
http://cxnets.googlepages.com.

[34] A community structure, in general terms, is an assignment of nodes into classes.
Community detection aims at partitioning nodes into homogeneous classes, according to
similarity or proximity considerations.

[35] To be precise, here $k_i = \sum_j g_{ij}$ is the degree and
$A(q, q') = \sum_{i<j} g_{ij} \delta_{q_i,q} \delta_{q_j,q'}$ is the number of links
between nodes with attributes q and q'.

[36] In other words, $\Sigma_{\phi(g,\mathbf{q})}$ is the Gibbs-Boltzmann entropy of the
ensemble of graphs which assigns equal weight to each graph g satisfying the constraints,
which is equivalent to the usual Shannon entropy of the distribution of graphs in this
ensemble.

[37] A more precise estimate of the probability of occurrence of a given value of Θ would
entail the study of large deviation properties of the entropy distribution. This goes
beyond our present purposes.

[38] This limitation is imposed by the fact that the saddle point method we use to evaluate
the entropy is reliable only if the number of imposed constraints (N for the degrees plus
the entries of $A(q, q')$) is of the same order of magnitude as N.

[39] A plausibility argument for the scaling behavior is the following. Consider a
particular permutation π and imagine making a small number $n \ll N$ of further
perturbations, by exchanging assignments on pairs of randomly chosen nodes. Each such
perturbation is likely to affect a different part of the network, which means that the
associated changes in the entropy can be considered as uncorrelated. Hence we expect a
change in the entropy density of the order of $\sqrt{n}/N$. This is expected to hold true
also for n/N finite but small, suggesting that, as N increases, the difference between the
entropies of two random permutations, and hence the denominator in Eq. (3), is of order
$1/\sqrt{N}$.



